AITopics

Country: North America > United States (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Neural Information Processing SystemsDec-25-2025, 11:36:56 GMT

Solving Interpretable Kernel Dimensionality Reduction

Kernel dimensionality reduction (KDR) algorithms find a low dimensional representation of the original data by optimizing kernel dependency measures that are capable of capturing nonlinear relationships. The standard strategy is to first map the data into a high dimensional feature space using kernels prior to a projection onto a low dimensional space. While KDR methods can be easily solved by keeping the most dominant eigenvectors of the kernel matrix, its features are no longer easy to interpret. Alternatively, Interpretable KDR (IKDR) is different in that it projects onto a subspace \textit{before} the kernel feature mapping, therefore, the projection matrix can indicate how the original features linearly combine to form the new features. Unfortunately, the IKDR objective requires a non-convex manifold optimization that is difficult to solve and can no longer be solved by eigendecomposition.

interpretable kernel dimensionality reduction, name change, proceedings, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Zhang, Kaicheng, Deidda, Piero, Higham, Desmond, Tudisco, Francesco

Rethinking Oversmoothing in Graph Neural Networks: A Rank-Based Perspective

arXiv.org Machine LearningFeb-16-2025

Oversmoothing is a fundamental challenge in graph neural networks (GNNs): as the number of layers increases, node embeddings become increasingly similar, and model performance drops sharply. Traditionally, oversmoothing has been quantified using metrics that measure the similarity of neighbouring node features, such as the Dirichlet energy. While these metrics are related to oversmoothing, we argue they have critical limitations and fail to reliably capture oversmoothing in realistic scenarios. For instance, they provide meaningful insights only for very deep networks and under somewhat strict conditions on the norm of network weights and feature representations. As an alternative, we propose measuring oversmoothing by examining the numerical or effective rank of the feature representations. We provide theoretical support for this approach, demonstrating that the numerical rank of feature representations converges to one for a broad family of nonlinear activation functions under the assumption of nonnegative trained weights. To the best of our knowledge, this is the first result that proves the occurrence of oversmoothing without assumptions on the boundedness of the weight matrices. Along with the theoretical findings, we provide extensive numerical evaluation across diverse graph architectures. Our results show that rank-based metrics consistently capture oversmoothing, whereas energy-based metrics often fail. Notably, we reveal that a significant drop in the rank aligns closely with performance degradation, even in scenarios where energy metrics remain unchanged.

artificial intelligence, machine learning, matrix, (14 more...)

arXiv.org Machine Learning

2502.04591

Country:

Europe > Italy (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

arXiv.org Artificial IntelligenceJan-25-2025

Difference vs. Quotient: A Novel Algorithm for Dominant Eigenvalue Problem

Liu, Xiaozhi, Xia, Yong

The computation of the dominant eigenvector of symmetric positive semidefinite matrices is a cornerstone operation in numerous machine learning applications. Traditional approaches predominantly rely on the constrained Quotient formulation, which underpins most existing methods. However, these methods often suffer from challenges related to computational efficiency and dependence on spectral prior knowledge. This paper introduces a novel perspective by reformulating the eigenvalue problem using an unconstrained Difference formulation. This new approach sheds light on classical methods, revealing that the power method can be interpreted as a specific instance of Difference of Convex Algorithms. Building on this insight, we develop a generalized family of Difference-Type methods, which encompasses the power method as a special case. Within this family, we propose the Split-Merge algorithm, which achieves maximal acceleration without spectral prior knowledge and operates solely through matrix-vector products, making it both efficient and easy to implement. Extensive empirical evaluations on both synthetic and real-world datasets highlight that the Split-Merge algorithm achieves over a $\boldsymbol{10\times}$ speedup compared to the basic power method, offering significant advancements in efficiency and practicality for large-scale machine learning problems.

artificial intelligence, machine learning, optimization problem, (16 more...)

2501.15131

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Neural Information Processing SystemsOct-10-2024, 04:16:21 GMT

Solving Interpretable Kernel Dimensionality Reduction

interpretable kernel dimensionality reduction, ism, objective, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Dimensionality Reduction (0.63)

Neural Information Processing SystemsMar-12-2024, 14:14:29 GMT

Coordinate-wise Power Method Qi Lei

In this paper, we propose a coordinate-wise version of the power method from an optimization viewpoint. The vanilla power method simultaneously updates all the coordinates of the iterate, which is essential for its convergence analysis. However, different coordinates converge to the optimal value at different speeds. Our proposed algorithm, which we call coordinate-wise power method, is able to select and update the most important k coordinates in O(kn) time at each iteration, where n is the dimension of the matrix and k apple n is the size of the active set. Inspired by the "greedy" nature of our method, we further propose a greedy coordinate descent algorithm applied on a non-convex objective function specialized for symmetric matrices. We provide convergence analyses for both methods. Experimental results on both synthetic and real data show that our methods achieve up to 23 times speedup over the basic power method. Meanwhile, due to their coordinate-wise nature, our methods are very suitable for the important case when data cannot fit into memory. Finally, we introduce how the coordinatewise mechanism could be applied to other iterative methods that are used in machine learning.

algorithm, dominant eigenvector, power method, (10 more...)

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Lebeau, Hugo, Chatelain, Florent, Couillet, Romain

Asymptotic Gaussian Fluctuations of Eigenvectors in Spectral Clustering

arXiv.org Machine LearningFeb-19-2024

The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which has been left uncharacterized until now. In this letter, it is shown that the signal $+$ noise structure of a general spike random matrix model is transferred to the eigenvectors of the corresponding Gram kernel matrix and the fluctuations of their entries are Gaussian in the large-dimensional regime. This CLT-like result was the last missing piece to precisely predict the classification performance of spectral clustering. The proposed proof is very general and relies solely on the rotational invariance of the noise. Numerical experiments on synthetic and real data illustrate the universality of this phenomenon.

eigenvector, matrix, spectral, (13 more...)

arXiv.org Machine Learning

2402.12302

Country:

Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.06)
North America > United States > Ohio > Cuyahoga County > Shaker Heights (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(4 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

arXiv.org Artificial IntelligenceSep-10-2023

Massively Scalable Inverse Reinforcement Learning in Google Maps

Barnes, Matt, Abueg, Matthew, Lange, Oliver F., Deeds, Matt, Trader, Jason, Molitor, Denali, Wulfmeier, Markus, O'Banion, Shawn

Optimizing for humans' latent preferences remains a grand challenge in route recommendation. Prior research has provided increasingly general techniques based on inverse reinforcement learning (IRL), yet no approach has been successfully scaled to world-sized routing problems with hundreds of millions of states and demonstration trajectories. In this paper, we provide methods for scaling IRL using graph compression, spatial parallelization, and problem initialization based on dominant eigenvectors. We revisit classic algorithms and study them in a large-scale setting, and make the key observation that there exists a trade-off between the use of cheap, deterministic planners and expensive yet robust stochastic policies. We leverage this insight in Receding Horizon Inverse Planning (RHIP), a new generalization of classic IRL algorithms that provides fine-grained control over performance trade-offs via its planning horizon. Our contributions culminate in a policy that achieves a 16-24% improvement in global route quality, and to the best of our knowledge, represents the largest instance of IRL in a real-world setting to date. Benchmark results show critical benefits to more sustainable modes of transportation, where factors beyond journey time play a substantial role. We conclude by conducting an ablation study of key components, presenting negative results from alternative eigenvalue solvers, and identifying opportunities to further improve scalability via IRL-specific batching strategies.

inverse reinforcement learning, learning, reinforcement learning, (15 more...)

2305.1129

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.04)
(17 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games > Computer Games (0.67)
Transportation > Ground > Road (0.46)
Information Technology > Services (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Scholkemper, Michael, Schaub, Michael

Blind Extraction of Equitable Partitions from Graph Signals

arXiv.org Artificial IntelligenceMar-10-2022

Finding equitable partitions is closely related to the extraction of graph symmetries and of interest in a variety of applications context such as node role detection, cluster synchronization, consensus dynamics, and network control problems. In this work we study a blind identification problem in which we aim to recover an equitable partition of a network without the knowledge of the network's edges but based solely on the observations of the outputs of an unknown graph filter. Specifically, we consider two settings. First, we consider a scenario in which we can control the input to the graph filter and present a method to extract the partition inspired by the well known Weisfeiler-Lehman (color refinement) algorithm. Second, we generalize this idea to a setting where only observe the outputs to random, low-rank excitations of the graph filter, and present a simple spectral algorithm to extract the relevant equitable partitions. Finally, we establish theoretical bounds on the error that this spectral detection scheme incurs and perform numerical experiments that illustrate our theoretical results and compare both algorithms.

algorithm, eigenvector, graph, (15 more...)

doi: 10.1109/ICASSP43922.2022.9746676

2203.05407

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.05)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Germany > North Rhine-Westphalia (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining (0.68)
Information Technology > Communications > Networks (0.66)

Gang, Arpita, Bajwa, Waheed U.

FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis

arXiv.org Artificial IntelligenceFeb-15-2022

Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning. While PCA is often thought of as a dimensionality reduction method, the purpose of PCA is actually two-fold: dimension reduction and uncorrelated feature learning. Furthermore, the enormity of the dimensions and sample size in the modern day datasets have rendered the centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, those either overlook the uncorrelated feature learning aspect of the PCA, tend to have high communication overhead that makes them inefficient and/or lack `exact' or `global' convergence guarantees. To overcome these aforementioned issues, this paper proposes a distributed PCA algorithm termed FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and is proven to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. The claims are further supported by experimental results.

artificial intelligence, eigenvector, machine learning, (19 more...)

doi: 10.1109/TSP.2022.3229635

2108.12373

Country:

North America > United States > New York (0.04)
North America > United States > New Jersey > Middlesex County > New Brunswick (0.04)
North America > United States > Maryland > Baltimore (0.04)
Europe > United Kingdom > England > East Sussex > Brighton (0.04)

Genre: Research Report (0.81)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Principal Component Analysis (0.60)